Harnessing Twitter "Big Data" for Automatic Emotion Identification
Abstract
User-generated content on Twitter (produced at an enormous rate of 340 million tweets per day) provides a rich source for gleaning people’s emotions, which is necessary for a deeper understanding of people’s behaviors and actions. Extant studies on emotion identification lack comprehensive coverage of “emotional situations” because they use relatively small training datasets. To overcome this bottleneck, we have automatically created a large emotion-labeled dataset (of about 2.5 million tweets) by harnessing emotion-related hashtags available in the tweets. We have applied two different machine learning algorithms for emotion identification, to study the effectiveness of various feature combinations as well as the effect of the size of the training data on the emotion identification task. Our experiments demonstrate that a combination of unigrams, bigrams, sentiment/emotion-bearing words, and parts-of-speech information is most effective for gleaning emotions. The highest accuracy (65.57%) is achieved with training data containing about 2 million tweets.
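The hashtag-based labeling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the hashtag-to-emotion map, the function names `label_tweet` and `ngram_features`, and the rule of discarding tweets with conflicting emotion hashtags are all assumptions made for illustration. The sketch shows only the labeling step and the unigram and bigram features; the sentiment-lexicon and part-of-speech features mentioned in the abstract are omitted.

```python
import re

# Hypothetical hashtag-to-emotion map; the actual hashtag list is not given here.
HASHTAG_TO_EMOTION = {
    "happy": "joy",
    "joy": "joy",
    "sad": "sadness",
    "angry": "anger",
    "scared": "fear",
}

HASHTAG_RE = re.compile(r"#(\w+)")


def label_tweet(text):
    """Return (cleaned_text, emotion), or None if the tweet has no
    emotion hashtag or has hashtags pointing to different emotions."""
    tags = [t.lower() for t in HASHTAG_RE.findall(text)]
    emotions = {HASHTAG_TO_EMOTION[t] for t in tags if t in HASHTAG_TO_EMOTION}
    if len(emotions) != 1:
        return None
    # Remove the label-bearing hashtags so a classifier cannot simply
    # read the label from the text; other hashtags are kept.
    cleaned = HASHTAG_RE.sub(
        lambda m: "" if m.group(1).lower() in HASHTAG_TO_EMOTION else m.group(0),
        text,
    )
    cleaned = " ".join(cleaned.split())  # collapse leftover whitespace
    return cleaned, emotions.pop()


def ngram_features(text):
    """Return lowercase unigrams followed by bigrams (joined with '_')."""
    tokens = text.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


if __name__ == "__main__":
    example = label_tweet("Got the job today! #happy #work")
    print(example)
    if example is not None:
        print(ngram_features(example[0]))
```

Stripping the emotion hashtag before feature extraction is a design choice in this sketch: without it, the label would leak into the features and inflate measured accuracy.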
Similar resources
Meta-level sentiment models for big social data analysis
People react to events, topics and entities by expressing their personal opinions and emotions. These reactions can correspond to a wide range of intensities, from very mild to strong. An adequate processing and understanding of these expressions has been the subject of research in several fields, such as business and politics. In this context, Twitter sentiment analysis, which is the task of a...
Discovering Emotions in the Wild: An Inductive Method to Identify Fine-grained Emotion Categories in Tweets
This paper describes a method to expose a set of categories that are representative of the emotions expressed on Twitter inductively from data. The method can be used to expand the range of emotions that automatic classifiers can detect through the identification of fine-grained emotion categories human annotators are capable of detecting in tweets. The inter-annotator reliability statistics fo...
2016 Olympic Games on Twitter: Sentiment Analysis of Sports Fans Tweets using Big Data Framework
Big data analytics is one of the most important subjects in computer science. Today, due to the increasing expansion of Web technology, a large amount of data is available to researchers. Extracting information from these data is one of the requirements for many organizations and business centers. In recent years, the massive amount of Twitter's social networking data has become a platform for ...
Design and Test of the Real-time Text mining dashboard for Twitter
One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in datasets that are currently being produced at high speed, in large volumes, and in a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing this huge amount of data has today become one of the concerns of data science scholar...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizing Analytical Workloads
Asserting that data is vital to business is an understatement. Organizations have generated more and more data for years, but struggle to use it effectively. Clearly data has more important uses than ensuring compliance with regulatory requirements. In addition, data is being generated with greater velocity, due to the advent of new pervasive devices (e.g., smartphones, tablets, etc.), social W...